Computational Analysis of Medieval Manuscripts: A New Tool for Analysis and Mapping of Medieval Documents to Modern Orthography

Authors

  • Mushtaq Ahmad
  • Stefan Gruner
  • Muhammad Tanvir Afzal
Abstract

Medieval manuscripts and other written documents from that period contain valuable information about the people, religion, and politics of the medieval period, making the study of medieval documents a necessary prerequisite to gaining in-depth knowledge of medieval history. Although tool-less study of such documents is possible and has been ongoing for centuries, much subtle information remains locked in such manuscripts unless it is revealed by effective means of computational analysis. Automatic analysis of medieval manuscripts is a non-trivial task, mainly due to non-conforming styles, spelling peculiarities, and the lack of relational structures (hyperlinks) that could be used to answer meaningful queries. Natural Language Processing (NLP) tools and algorithms are used to carry out computational analysis of text data. However, due to the high percentage of spelling variations in medieval manuscripts, NLP tools and algorithms cannot be applied directly for computational analysis. If the spelling variations are mapped to standard dictionary words, then application of standard NLP tools and algorithms becomes possible. In this paper we describe a web-based software tool, CAMM (Computational Analysis of Medieval Manuscripts), that maps medieval spelling variations to a modern German dictionary. We describe the steps taken to acquire, reformat, and analyze the data and to produce putative mappings, as well as the steps taken to evaluate the findings. At the time of writing, CAMM provides access to 11275 manuscripts organized into 54 collections containing a total of 242446 distinctly spelled words. CAMM accurately corrects the spelling of 55 percent of the verifiable words. CAMM is freely available at http://researchworks.cs.athabascau.ca/
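The abstract does not say which matching algorithm CAMM uses, so the following is only a minimal sketch of the general idea of mapping a historical spelling variant to a modern dictionary entry. It assumes plain string similarity from Python's standard `difflib` module; the tiny lexicon, the function name `map_to_modern`, and the 0.6 similarity cutoff are all hypothetical and chosen for illustration.

```python
from difflib import get_close_matches

# Hypothetical miniature modern German lexicon (assumption for illustration);
# a real run would load a full dictionary instead.
MODERN_LEXICON = ["und", "jahr", "stadt", "kirche", "zeit", "haus"]


def map_to_modern(word, lexicon=MODERN_LEXICON, cutoff=0.6):
    """Return the most similar lexicon entry for a spelling variant, or None.

    This is generic fuzzy matching, not the method described in the paper.
    """
    normalized = word.lower()
    # An exact hit needs no correction.
    if normalized in lexicon:
        return normalized
    # Otherwise take the single best candidate above the similarity cutoff.
    matches = get_close_matches(normalized, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for variant in ["vnd", "jar", "Stadt", "xyz"]:
        print(variant, "->", map_to_modern(variant))
```

Under these assumptions, a variant with no sufficiently similar entry is left unmapped (`None`) rather than forced onto a wrong word, which keeps putative mappings separate from unverifiable ones.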


Similar resources

Against the Current: Farid al-Din ‘Attar’s Diverse Voices

Love and its transformative power have long been at the center of Islamic Sufism. For Sufi writers profane love, perceived as the love of worldly beloved, was the first step on the path toward the union with the divine. Farid al-Din ‘Attar (1145-1221) was one of the most significant authors to espouse and articulate profane love as a representation of both earthly and heavenly love. 'Attar’s us...

Medieval Manuscripts, Hypertext and Reading. Visions of Digital Editions

How was a medieval manuscript meant to be read? This is a question that has concerned me for a long time in my work with Old Swedish manuscripts from Vadstena Abbey. In many manuscripts we can find traces of the historical reading situation; for example, pointing hands, marginal notes, etc. Such signals had an important function for the medieval reader, but they are rarely put forward in modern...

Symbol Classification Approach for OMR of Square Notation Manuscripts

Researchers in the field of OMR (Optical Music Recognition) have acknowledged that the automatic transcription of medieval musical manuscripts is still an open problem [2, 3], mainly due to lack of standards in notation and the physical quality of the documents. Nonetheless, the amount of medieval musical manuscripts is so vast that the consensus seems to be that OMR can be a vital tool to help...

Exploring traditions

After the boom of pharmacological research during the 1950s, mainly as a result of the screening technique, the identification of active molecules, their reshaping by means of pharmaco-chemical drug design, and the creation of extremely efficacious and successful medicines, the pharmaceutical world sought a new source of inspiration. Such biota as the tropical forest, a quantity of plants credi...

Journal title:
  • J. UCS

Volume 18  Issue 

Pages  -

Publication date 2012